{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Working with Multiple DataFrames\n",
    "<hr style=\"height:0.6px;border:none;color:#666;background-color:#666;\" />\n",
    "\n",
    "Often you'll work with multiple dataframes that you want to stick together or merge. `df.merge()` and `df.concat()` are all you need to know for combining dataframes. The Pandas [documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html) is very helpful for these functions, but they are pretty easy to grasp.\n",
    "\n",
    "```{note}\n",
    "The example joins shown in this section are inspired by [Chapter 15](https://stat545.com/join-cheatsheet.html) of Jenny Bryan's STAT 545 materials.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sticking DataFrames Together with `pd.concat()`\n",
    "<hr>\n",
    "You can use `pd.concat()` to stick dataframes together:\n",
    "- Vertically: if they have the same **columns**, OR\n",
    "- Horizontally: if they have the same **rows**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df1 = pd.DataFrame({'A': [1, 3, 5],\n",
    "                    'B': [2, 4, 6]})\n",
    "df2 = pd.DataFrame({'A': [7, 9, 11],\n",
    "                    'B': [8, 10, 12]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   A  B\n",
       "0  1  2\n",
       "1  3  4\n",
       "2  5  6"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B\n",
       "0   7   8\n",
       "1   9  10\n",
       "2  11  12"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B\n",
       "0   1   2\n",
       "1   3   4\n",
       "2   5   6\n",
       "0   7   8\n",
       "1   9  10\n",
       "2  11  12"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat((df1, df2), axis=0)  # axis=0 specifies a vertical stick, i.e., on the columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the indexes were simply joined together? This may or may not be what you want. To reset the index, you can specify the argument `ignore_index=True`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B\n",
       "0   1   2\n",
       "1   3   4\n",
       "2   5   6\n",
       "3   7   8\n",
       "4   9  10\n",
       "5  11  12"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat((df1, df2), axis=0, ignore_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `axis=1` to stick together horizontally:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   0  1   2   3\n",
       "0  1  2   7   8\n",
       "1  3  4   9  10\n",
       "2  5  6  11  12"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat((df1, df2), axis=1, ignore_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You are not limited to just two dataframes, you can concatenate as many as you want:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A   B\n",
       "0    1   2\n",
       "1    3   4\n",
       "2    5   6\n",
       "3    7   8\n",
       "4    9  10\n",
       "5   11  12\n",
       "6    1   2\n",
       "7    3   4\n",
       "8    5   6\n",
       "9    7   8\n",
       "10   9  10\n",
       "11  11  12"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat((df1, df2, df1, df2), axis=0, ignore_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Joining DataFrames with `pd.merge()`\n",
    "<hr>\n",
    "\n",
    "`pd.merge()` gives you the ability to \"join\" dataframes using different rules (just like with SQL if you're familiar with it). You can use `df.merge()` to join dataframes based on shared `key` columns. Methods include:\n",
    "- \"inner join\"\n",
    "- \"outer join\"\n",
    "- \"left join\"\n",
    "- \"right join\"\n",
    "\n",
    "See this great [cheat sheet](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#compare-with-sql-join) and [these great animations](https://github.com/gadenbuie/tidyexplain) for more insights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1 = pd.DataFrame({\"name\": ['Magneto', 'Storm', 'Mystique', 'Batman', 'Joker', 'Catwoman', 'Hellboy'],\n",
    "                    'alignment': ['bad', 'good', 'bad', 'good', 'bad', 'bad', 'good'],\n",
    "                    'gender': ['male', 'female', 'female', 'male', 'male', 'female', 'male'],\n",
    "                    'publisher': ['Marvel', 'Marvel', 'Marvel', 'DC', 'DC', 'DC', 'Dark Horse Comics']})\n",
    "df2 = pd.DataFrame({'publisher': ['DC', 'Marvel', 'Image'],\n",
    "                    'year_founded': [1934, 1939, 1992]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>alignment</th>\n",
       "      <th>gender</th>\n",
       "      <th>publisher</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Magneto</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>Marvel</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Storm</td>\n",
       "      <td>good</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mystique</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Batman</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Joker</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Catwoman</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>DC</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Hellboy</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>Dark Horse Comics</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       name alignment  gender          publisher\n",
       "0   Magneto       bad    male             Marvel\n",
       "1     Storm      good  female             Marvel\n",
       "2  Mystique       bad  female             Marvel\n",
       "3    Batman      good    male                 DC\n",
       "4     Joker       bad    male                 DC\n",
       "5  Catwoman       bad  female                 DC\n",
       "6   Hellboy      good    male  Dark Horse Comics"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>publisher</th>\n",
       "      <th>year_founded</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>DC</td>\n",
       "      <td>1934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Image</td>\n",
       "      <td>1992</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  publisher  year_founded\n",
       "0        DC          1934\n",
       "1    Marvel          1939\n",
       "2     Image          1992"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An \"inner\" join will return all rows of `df1` where matching values for \"publisher\" are found in `df2`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>alignment</th>\n",
       "      <th>gender</th>\n",
       "      <th>publisher</th>\n",
       "      <th>year_founded</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Magneto</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Storm</td>\n",
       "      <td>good</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mystique</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Batman</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Joker</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Catwoman</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       name alignment  gender publisher  year_founded\n",
       "0   Magneto       bad    male    Marvel          1939\n",
       "1     Storm      good  female    Marvel          1939\n",
       "2  Mystique       bad  female    Marvel          1939\n",
       "3    Batman      good    male        DC          1934\n",
       "4     Joker       bad    male        DC          1934\n",
       "5  Catwoman       bad  female        DC          1934"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how=\"inner\", on=\"publisher\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](inner_join.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An \"outer\" join will return all rows of `df1` and `df2`, placing NaNs where information is unavailable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>alignment</th>\n",
       "      <th>gender</th>\n",
       "      <th>publisher</th>\n",
       "      <th>year_founded</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Magneto</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Storm</td>\n",
       "      <td>good</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mystique</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Batman</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Joker</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Catwoman</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Hellboy</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>Dark Horse Comics</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Image</td>\n",
       "      <td>1992.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       name alignment  gender          publisher  year_founded\n",
       "0   Magneto       bad    male             Marvel        1939.0\n",
       "1     Storm      good  female             Marvel        1939.0\n",
       "2  Mystique       bad  female             Marvel        1939.0\n",
       "3    Batman      good    male                 DC        1934.0\n",
       "4     Joker       bad    male                 DC        1934.0\n",
       "5  Catwoman       bad  female                 DC        1934.0\n",
       "6   Hellboy      good    male  Dark Horse Comics           NaN\n",
       "7       NaN       NaN     NaN              Image        1992.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how=\"outer\", on=\"publisher\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Return all rows from `df1` and all columns of `df1` and `df2`, populated where matches occur:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>alignment</th>\n",
       "      <th>gender</th>\n",
       "      <th>publisher</th>\n",
       "      <th>year_founded</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Magneto</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Storm</td>\n",
       "      <td>good</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mystique</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Batman</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Joker</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Catwoman</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Hellboy</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>Dark Horse Comics</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       name alignment  gender          publisher  year_founded\n",
       "0   Magneto       bad    male             Marvel        1939.0\n",
       "1     Storm      good  female             Marvel        1939.0\n",
       "2  Mystique       bad  female             Marvel        1939.0\n",
       "3    Batman      good    male                 DC        1934.0\n",
       "4     Joker       bad    male                 DC        1934.0\n",
       "5  Catwoman       bad  female                 DC        1934.0\n",
       "6   Hellboy      good    male  Dark Horse Comics           NaN"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how=\"left\", on=\"publisher\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](left_join.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>alignment</th>\n",
       "      <th>gender</th>\n",
       "      <th>publisher</th>\n",
       "      <th>year_founded</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Batman</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Joker</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Catwoman</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Magneto</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Storm</td>\n",
       "      <td>good</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Mystique</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Image</td>\n",
       "      <td>1992</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       name alignment  gender publisher  year_founded\n",
       "0    Batman      good    male        DC          1934\n",
       "1     Joker       bad    male        DC          1934\n",
       "2  Catwoman       bad  female        DC          1934\n",
       "3   Magneto       bad    male    Marvel          1939\n",
       "4     Storm      good  female    Marvel          1939\n",
       "5  Mystique       bad  female    Marvel          1939\n",
       "6       NaN       NaN     NaN     Image          1992"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how=\"right\", on=\"publisher\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are many ways to specify the `key` to join dataframes on, you can join on index values, different, column names, etc. Another helpful argument is the `indicator` argument which will add a column to the result telling you where matches were found in the dataframes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>alignment</th>\n",
       "      <th>gender</th>\n",
       "      <th>publisher</th>\n",
       "      <th>year_founded</th>\n",
       "      <th>_merge</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Magneto</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Storm</td>\n",
       "      <td>good</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mystique</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>Marvel</td>\n",
       "      <td>1939.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Batman</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Joker</td>\n",
       "      <td>bad</td>\n",
       "      <td>male</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Catwoman</td>\n",
       "      <td>bad</td>\n",
       "      <td>female</td>\n",
       "      <td>DC</td>\n",
       "      <td>1934.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Hellboy</td>\n",
       "      <td>good</td>\n",
       "      <td>male</td>\n",
       "      <td>Dark Horse Comics</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Image</td>\n",
       "      <td>1992.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       name alignment  gender          publisher  year_founded      _merge\n",
       "0   Magneto       bad    male             Marvel        1939.0        both\n",
       "1     Storm      good  female             Marvel        1939.0        both\n",
       "2  Mystique       bad  female             Marvel        1939.0        both\n",
       "3    Batman      good    male                 DC        1934.0        both\n",
       "4     Joker       bad    male                 DC        1934.0        both\n",
       "5  Catwoman       bad  female                 DC        1934.0        both\n",
       "6   Hellboy      good    male  Dark Horse Comics           NaN   left_only\n",
       "7       NaN       NaN     NaN              Image        1992.0  right_only"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how=\"outer\", on=\"publisher\", indicator=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By the way, you can use `pd.concat()` to do a simple \"inner\" or \"outer\" join on multiple datadrames at once. It's less flexible than merge, but can be useful sometimes."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}